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Abstract 

Adaptive Monte Carlo methods are recent variance reduction techniques. In this 
work, we propose a mathematical setting which greatly relaxes the assumptions needed 
by for the adaptive importance sampling techniques presented in [24, 23, 1, 2]. We 
establish the convergence and asymptotic normality of the adaptive Monte Carlo esti- 
mator under local assumptions which are easily verifiable in practice. We present one 
way of approximating the optimal importance sampling parameter using a randomly 
truncated stochastic algorithm. Finally, we apply this technique to some examples of 
valuation of financial derivatives. 

1 Introduction 

Monte-Carlo methods aim at computing the expectation E(Z) of a real-valued random 
variable Z using samples along the law of Z . In this work, we focus on cases where there 
exists a parametric representation of the expectation 

E(Z) = E(i?(6',X)) for all 6' G M'^, (1) 

where X is a random vector with values in and : M"^ x i — > M is a measurable 
function satisfying E|iJ(0,X)| < oo for all Q G M'^. We also impose that 

d I — > v(d) = Var(F(6l, X)) is finite for ah B G M'^, (2) 

We want to make the most of this free parameter to settle an automatic variance reduc- 
tion method, see [8] for a recent survey on adaptive variance reduction. It consists in first 
finding a minimiser of the variance v and then in plugging it into a Monte Carlo method 
with a narrower confidence interval. This technique heavily relies on the ability to find a 
parametric representation and to effectively minimise the function v. Many papers have 
been written on how to construct parametric representations X) for several kinds 
of random variables Z. We mainly have in mind examples based on control variates (see 
[4, 13, 12]) or importance sampling (see [24, 23, 1, 2]). We refer the reader to section 4 
for a presentation of a few examples. 
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Assume we have a parametric representation of the form H{9, X) satisfying Equa- 
tions (1) and (2). Let {Xn)n be an independent and identically distributed sequence of 
random vectors following the law of X. Assume we know how to use the sequence {Xn)n 
to build an estimator On of 6* adapted to the filtration J^n = ■ ■ ■ i^n)- Once such 

an approximation is available, there are at least two ways of using it to devise a variance 
reduction method. 

The non-adaptive algorithm 

Algorithm 1.1 (Non adaptive importance sampling (NADIS)). Let n be the number 
of samples used for the Monte Carlo computation. Draw a second set of n samples 
{X[, . . . , X'^) independent of {Xi, . . . , Xn) and compute 

1 " 
n ^-^ 

Since the sequence (0„)n converges to 0*, the convergence of (^n)n to E(Z) ensues from 
the strong law of large numbers and the sequence (^n)n satisfies a central limit theorem 

V^(e„-E(Z))^AA(0,^;r)). 

This algorithm has been studied in [23, 2] and required 2n samples. It may use less than 
2n samples if the estimation of 0* is performed on a smaller number of samples but then 
it raises the question of how many samples to use. 

The adaptive algorithm The adaptive approach is to use the same samples 
(Xi, . . . to compute Qn and the Monte Carlo estimator. Compared to the sequential 
algorithm, the adaptive one uses half of the samples. 

Algorithm 1.2 (Adaptive Importance Sampling (ADIS)). Let n he the number of samples 
used for the Monte Carlo computation. 
For 9q fixed in W^, compute 

n 

(n = -y^H{ei^i,Xi). (3) 

i=l 

Note that the sequence (^j)j can be written in a recursive manner so that it can be 
updated online each time a new iterate Oi is drawn 

Ci+i = -^(i + -^H{9„ Xi+i), with ^0 = 0. 

Being able to update the sequence (^i)j online has the advantage that there is no need 
to store the whole sequence {Xi, . . . ,Xn) for computing This adaptive algorithm was 
first studied in [1] in which the author studied the convergence of the sequence {S,n)n 
under assumptions to be verified along the path {9n)n which makes them hard to check 
in practise. In this article, we prove a new convergence result under local integrability 
conditions on the function H, namely we impose that for any compact subset K of M", 
supggj^ E(|//(0, < oo. We refer the reader to section 2.1 for a precise statement 
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and proof of these results. We want to emphasize that such assumptions only involving 
properties of the function H and not of the sequence {9n)n are far easier to check in 
practice. 

Sofar, we have assumed that we knew how to devise a convergent estimator of 9*, but 
this may not be so simple as when no closed form expression is available for E {H{6,X)), 
there is hardly no chance that the function v can be computed explicitly. Henceforth, it is 
needed to approximate 9* without being able to compute the variance itself. In this work, 
we recall the methodology based on stochastic approximation developed in [23, 24, 2] to 
estimate 9* using some stochastic gradient style algorithms. We aim at applying this 
methodology to the evaluation of financial derivatives and the main difficulty in approxi- 
mating 9* comes from the non-boundedness of the payoff functions usually considered and 
consequently the non-boundedness of the H functions. To encompass this problem, several 
authors as in [24, 23] restrict the parameter 9 to lie in a compact set, which is obviously 
unknown in practice; therefore, this compact set will have to be quite large. Although, 
it permits to prove the theoretical convergence of the Robbins-Monro algorithm it does 
not help to build a numerically convergent estimator of 9*. We all know that the true 
convergence of stochastic algorithms highly relies on the fine tuning of the gain sequence 
which reveals to be very difficult when dealing with an artificially bounded parameter set. 

In this work following [2], we would rather use a randomly truncated algorithm which 
is known to converge for a much wider class of functions. We give a unified framework 
with easily verifiable assumptions under which Algorithm 1.2 converges and satisfies a 
central limit theorem. Then, we combine this convergence result with the new results on 
randomly truncated stochastic algorithm from [16] to revisit the adaptive algorithm in the 
Gaussian framework studied in [1]. 

The paper is organised as follows. In Section 2, we focus on the mathematical founda- 
tion of the method and give both a strong law of large numbers and a central limit theorem 
for the adaptive estimator under weak assumptions. In Section 3, we present one way of 
constructing a convergent estimator of 9* and recall some recent results on stochastic ap- 
proximation. Then, we give in Section 4 some examples of how to construct a parametric 
estimator using importance sampling or other more elaborate transformations. Finally, 
we illustrate the convergence results obtained in Section 2 on numerical examples coming 
from financial problems. 

2 Mathematical foundations of the method 

Notations: 

• We encode any elements of as column vectors. 

• If x E M"*, X* is a row vector. We use the "*" notation to denote the transpose 
operator for vectors and matrices. 

• If X, y G R'", X • y denotes the Euclidean scalar product of x and y and the associated 
norm is denoted by | • |. 

In this section, (X„,)„>i is an i.i.d. sequence following the law of X and we introduce 
the (T— algebra it generates = cr(Xi, . . . , For technical reasons, we assume that 
the variance v does not vanish, i.e. infggKn v{9) > 0. If such is not the case, it means that 
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we are actually in a better situation as far as variance reduction is concerned but it does 
not fit in our framework. 

2.1 An adaptive strong law of large numbers 

Theorem 2.1 (Adaptive strong law of large numbers). Assume Equation (1) and (2) 
hold. Let {9n)n>o be a {J-n) — adapted sequence with values in such that for all n > 0, 
On < oo a.s and for any compact subset K C M'', supgg;^- E(|i/(0, < oo. // 

1 " 

inf v{9) > Q and -"S^ v{9k) < oo a.s., (4) 
eeR'i n ^ 

fe=0 

then converges a.s. to E(Z). 

Proof. For any p > 0, we define Tp = inf{/c > 0;\9k\ > p}- The sequence (Tp)p is an 
increasing sequence of (J^^)— stopping times such that limp_>.oo Tp f c« a.s.. Let M„ = 
Y:'IIo H{9,,Xi+i) - E(Z). We introduce Ml^ = M^^^n defined by 

n-lArp n-1 

n\H{9i,X,+i) - E(Z)|2l^.<,^}) < E(l|,<,^}E(|i7(e,X) - E(Z)|2),=,J. On the set 
{i < Tp}, the conditional expectation is bounded from above by sviY>\e\<p'^{^)- Hence, 
the sequence {M^)n is square integrable and it is obvious that {M^)n is a martingale, 
which means that the sequence (M„)„ is a locally square integrable martingale (i.e. a local 
martingale which is locally square integrable). 

n— 1 n— 1 

(M)„ = ^E((F(0,,X,+i) -E(Z))VO = J^^^(^i)- 

i=0 i=0 

By Condition (4), we have a.s. limsup^ ^(M)„ < oo and liminf„^(M)„ > 0. Applying 
the strong law of large numbers for locally martingales (see [18]) yields the result. □ 

The sequence {9n)n can be any sequence adapted to (X„)„>i convergent or not. For 
instance, {9n)n can be an ergodic Markov chain distributed around the minimizer 9* such 
as Monte Carlo Markov Chain algorithms. 

Remark 2.2. When the sequence {9n)n>o converges a.s. to a deterministic constant 
9oo, it is sufficient to assume that v is continuous at 9oo o-nd v{9oo) > to ensure that 
Condition (4) is satisfied. Note that there is no need to impose that 9oo = 9* although it is 
undoubtedly wished in practice. For instance, 9oo can be an approximation of 9* obtained 
either by heuristic arguments such as large deviations. 

2.2 A Central limit theorem for the adaptive strong law of large numbers 

To derive a central limit theorem for the adaptive estimator we need a central limit 
theorem for locally square integrable martingales, whose convergence rate has been exten- 
sively studied. We refer to the works of Rebolledo [21], Jacod and Shiryaev [7], Hall and 
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Heyde [6] and Whitt [25] to find different statements of central limit theorems for locally 
square integrable cadlag martingales in continuous time, from which theorems can easily 
be deduced for discrete time locally square integrable martingales. 

Theorem 2.3. Assume Equation (1) and (2) hold. Let {9n)n>o be a J-n— adapted se- 
quence with values in R'^ such that for all n > 0, 0n < oo a.s and converging to some 
deterministic value 9o^. Assume there exists r] > such that the function S2+n '■ 9 G 
I — > E [\H{6, X)\'^~^^) is finite for all 6 £ and continuous at Oao- Moreover, if v is 

continuous at O^o and v{6oo) > 0, then, ^/n{^n — '^{Z)) -— > AA(0, u(^oo))- 

Proof. We know from the proof of Theorem 2.1 that M„ = ^^=o H{9i, Xij^i) — E(Z) is a 
locally square integrable martingale and that ^{M)n converges a.s. to v{9oo)- 

n-1 / n-1 \ 

-^E(|if(0„X,+i) -E(Z)|2+^|J-0 < c - J] 52+^,(0,) +E(Z)2+'? . 

^ i=0 \ i=0 / 

The term on the r.h.s is bounded thanks to the continuity of S2+r) at ^oo- Hence, the local 
martingale {Mn)n satisfies Lindeberg's condition. The result ensues from the central limit 
theorem for locally martingales. □ 

Corollary 2.4 (Effective central limit theorem with confidence interval). Assume Equa- 
tion (1) and (2) hold. Let {9n)n>o be a J-n— adapted sequence with values in such that 
for all n > 0, 9n < oo a.s and converging to some deterministic value 9oo- Assume there 
exists rj > such that the function : 9 £ 1 — > E [\L{{9, X)\^~^^) is finite for all 
9 £ R<^ and continuous at 9oo. Then, al = \ YJI=q H{9i,Xi+if - H ^ v{9oo). If 
moreover v{9oo) > 0, then ^{Cn - E(Z)) '""^ ) AA(0, 1). 

^" n— !>+oo 

Remark 2.5. Even ifv{9oo) > 0, (T„, may take negative values forn small. This corollary 
is really essential from a practical point of view because it proves that confidence intervals 
can he built as in the case of a crude Monte Carlo procedure. The only difference lies in 
the way of approximating the asymptotic variance. 

The assumptions of Theorem 2.3 are fairly easy to check in practice since they are 
formulated independently of the sequence {9n)n- When ^00 = 9* , which is nonetheless 
not required, the limiting variance is optimal in the sense that a crude Monte Carlo 
computation with the optimal parameter 9* would have lead to the same limiting variance. 
These assumptions are satisfied in the frameworks introduced in Section 4. 

3 Estimation of the optimal variance parameter 

From Theorem 2.1 and Theorem 2.3, we know that if we can construct a convergent 
estimator {9n)n of 9*, the adaptive estimator ^„ is a convergent and asymptotically normal 
estimator of the expectation E(Z). The challenging issue is now to propose an automatic 
way of approximating the minimiser 9* of v{9) = ¥,{H{9,X)'^) — E(Z)^. In the following, 
we will assume that v is strictly convex, goes to infinity at infinity and is continuously 
differentiable. Moreover, we assume that admits a representation as an expectation 

Vv{9) =E{U{9,X)), 
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where U : measurable and integrable function. We could see in the 

examples developed in Section 4 that these conditions are very easily satisfied. Stochastic 
algorithms such as the Robbins Monro algorithm (see [22]) are perfectly well suited to 
estimate quantities defined as the root of an expectation. Because for the applications 
we are targeting we cannot impose that E{\U{e,X)\^) < c(l + |6'|2), the Robbins-Monro 
algorithm will fail to converge and we need a more robust algorithm. This will naturally 
lead us to consider randomly truncated stochastic algorithms as introduced by Chen et 
al. [3]. When dealing with stochastic approximations, the idea of averaging the iterates 
comes out quite naturally to smooth the trajectories, see Section 3.2. 

3.1 Randomly truncated stochastic algorithms 

Let {Xn)j^>i be an i.i.d sequence of random variables following the law of X and (7n)„>i 
be a decreasing sequence of a positive real numbers satisfying 

^7„ = oo and ^7^<oo. (5) 

n n 

The sequence (7n)n is often called the gain sequence or the step sequence. We define the 
cj— field Tn = o'{Xk, k < n). We introduce an increasing sequence of compact sets {Kj)j 
of R'^ 

oo 

[jKn =R'^ and Kn C (6) 

n=0 

Now, we can present the randomly truncated stochastic algorithm introduced in [3], which 
essentially consists in a truncation of the Robbins Monro algorithm on an increasing 
sequence of compact sets. For £ and ao = 0, we define the sequences of random 
variables (6'„)^ and (a,„)^ by 

= On — In+lU {On, Xn+l) , 

' if G On+1 = and a„+i = a„, (7) 

}i O^^i ^ fCa,, On+i = OQ and a„+i = a„ + 1. 

On+i is the new sample we draw, either we accept it and set 9n+i = or we reject it 
and reset the algorithm to when it tries to jump too far ahead in a single step. Note that 
1 is actually drawn along the dynamics of the Robbins Monro algorithm and either we 
accept it as the new iterate or we reject it when the algorithm tries to jump to far ahead 
and in this case we reset the new iterate to ^o- In the following, we write Equation (7) in 
a more condensed form 

On+l = Tk^^ {On - ln+lU{On, ^n+l)) , (8) 

where 7x^^ denotes the truncation on the compact sets K^^ . 

The use of truncations enables to relax the hypotheses required to ensure the conver- 
gence. From the recent results of [16], we can state the following convergence result 

Theorem 3.1. Assume that 

(Al) Vf is continuous and there exists a unique 9* s.t. Vv{0*) = and \/ ^ 9* , 
{Vv{9)\9-0*) > 0. 
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(A2) \/q>0, sup|e|<,E(|[/(0,Z)|2) <oo. 

Then, the sequence {On)n defined by (7) converges a.s. to 0* for any sequence of compact 
sets satisfying (6) and moreover the sequence (an)„ is a.s. finite. 

Note that the assumptions required to ensure the convergence are very weak and are 
formulated independently of the algorithm trajectories, which makes them easy to check. 
Since the variance reduction technique we settle here aims at being automatic in the sense 
that it does not require any fiddling with the gain sequence depending on the function f/, 
it is quite natural to average the procedure defined by Equation (7). 

3.2 Averaging a stochastic algorithm 

This section is based on the remark that Cesaro type averages tend to smooth the be- 
haviour of convergent estimators at least from a theoretical point of view. Such averaging 
techniques have already been studied and proved to provide asymptotically efficient esti- 
mators (see for instance [20], [14] or [19]). 

At the same time, it is well known that true Cesaro averages are not so efficient 
from a practical point of view because the rate at which the impact of the first iterates 
vanishes in the average is too slow and it induces some kind of a numerical bias which in 
turn dramatically slows down the convergence. Combining these two facts has led us to 
consider a moving window average of Algorithm (7). 

In this section, we restrict to gain sequences of the form 7„ = i^^^^-^a with ^ < a < 1 . Let 
r > be the length of the window used for averaging. For ?i > 1, we introduce 

a„(r) = ^ V Oi with p = sup{/c > 1 : /c + r/7fc < n} A n. (9) 
r ^-^ 

i=p 

We use the convention sup0 = +oo. The almost sure convergence of {9n{T))n can easily 
be deduced from Theorem 3.1. The asymptotic normality of the sequence {On{T))n has 
been studied in [15]. The definition of 9n is a little different from the one used in [15] 
because we want to ensure that the sequence {9n)n is adapted to the filtration Fn in view 
of the use of as an estimator of 9* in Algorithm 1.2. 

4 Examples of parametric Monte-Carlo settings 

In this section, we give various examples of cases in which a parametric representation of 
the expectation of interest is available 

E(Z) =E(i/(6',X)). 

In each example, we highlight the strong convexity and the regularity of the function 
9 I — 7- E(//2(0,X)) such that the minimiser 9* is uniquely defined as the one root of 
01 — >Ve^{H^{0,X)). 
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4.1 Importance sampling for normal random variables 

Let G = {Gi, . . . ,Gd) be a d— dimensional standard normal random vector. For any 
measurable function h : M'^ — > M such that E(|/i(G)|) < oo, one has for all 9 G M"^ 

-0-G 



E{h{G)) =E\^e-''-'^- — h{G + 9)j . (10) 

Assume we want to compute E(/(G)) for a measurable function / : M'^ — > M such that 

/(G) is integrable. By applying equality (10) to h = f and h{x) = /^(x)e~^'^~^ ~, one 

-f) r 1^1^ 

obtains that the expectation and the variance of the random variable /(G + 0) e 2 
are respectively equal to E(/(G)) and u (0) — E^(/(G)) where 

t;(0) =E (^/2(G)e-^-^+^ 

The strict convexity of the function v is already known from [23] for instance. For the 
sake of completeness, we prove here a slightly improved version of this result. 

Proposition 4.1. Assume that 

P(/(G) / 0) > 0, (11) 
3e > 0, E(|/(G)|2+^) < 00 (12) 

Then, v is infinitely continuously differentiable and strongly convex. 

Proof. The function 9 1— )■ f'^(G)e^^''^^~ is infinitely continuously differentiable. Since, 

sup |9,./2(G)e-^-«+^| < e^/2(G) (m + (e«^' + e"^^ )) {{{e^'^^" ^ ^-mg^^ 
\e\<M ^ ^ 

where the right hand side is integrable because by Holder's inequality and Equation (12), 
we have G M'^, E (/^(G) e"^'*^) < 00. Lebesgue's theorem ensures that v is continuously 

differentiable with ^v{9) = E (^f{G){9^ - GJ>-^-^+^^ . Higher order differentiability 
properties are obtained by similar arguments and in particular the Hessian matrix writes 

VMO) = E (^/^(G)e-^-^+^)j I + E(^i9-G){9- G)*/^(G)e-^-^+^^ 

The second term in the above equation is a positive semi-definite matrix, hence 

V\{9) > E(/2(G)e^^-^+^) = E(/2(G)e"^-^)E(e^-^) > E(|/(G)|)2. 

Assumption (11) ensures that E(|/(G)|) > 0. Then, the Hessian matrix is uniformly 
bounded from below by the positive definite matrix E(|/(G)|)^I. This yields the strong 
convexity of the function v. □ 

Proposition 4.1 implies that v has a unique minimiser 9* characterised by Vv{9*) = 0, 



i.e. E (0* - G)e-'^ •^+^/^(G) = 
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4.2 Importance sampling for processes 

Equality (10) can actually be extended to the Brownian motion framework using Gir- 
sanov's theorem. Let (Wj,0 < t < T) be a d— dimensional Brownian motion and T its 
natural filtration. For any measurable and J-"— predictable process (0t,O < t < T) such 
that E (e^ -^o W'^dt^ ^ 

E{f{Wt,0 <t<T)) =E(^e-^oO^-'iW^-Uo\(>^\^dtj (^t + Osds,0 <t<T^^ . 

Assume 0t = e W^, for all t E [0, T]. The variance of q-^-'^t-'-^ f[Wt, 0<t<T) writes 
down v{9) - E {f{Wt, 0<t<T)) with 



v{e) = E ( {Wt + et,0<t<T)] . 



A similar result to Proposition 4.1 holds; in particular v is infinitely continuously differ- 
entiable, strictly convex and goes to infinity at infinity. 

For more general processes {9t,0 <t< T), we refer the reader to [17]. 



4.3 The exponential change of measure 

The idea of tilting some probability measure to find the ones that minimises the variance 
is a very common idea which can be also be applied to a wide range of distribution, see 
for instance the recent results of Kawai [11, 10] in which he applied an exponential change 
of measure to Levy processes, also known as the Esscher transform. 

Consider a random variable X with values in M"' and cumulative generating function 
^p{9) = logE(e^-^). We assume that il^{6) < oo for all 9 G W^. Let p denote the density 
of X. We define the density pg by 

pe{x) =p{x)e^-''-^^^\ xGM'^. 

Let X(^) have pe as a density, then 



E(/(X)) =E 



PeiXW) 



The variance of 



v{9) = E 



writes v{9) - E 



2 pjxWf 



with 



¥.(f{Xfe-'- 



Obviously, this change of measure is only valuable as a variance reduction technique if 
can be simulated at approximately the same cost as X. 



Proposition 4.2. Assume that 

3e>0, E(|/(G)|2+^) < oo 
lim pe{x) = for all x in 



(13) 
(14) 



Then, v is infinitely continuously differentiahle, convex and \vca.\e\ ^ooV{9) = oo. 
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Proof. To prove the differentiabihty of f , it suffices to reproduce the first part of the proof 
of Proposition 4.1. The convexity of v comes from the log-convexity of -0. Moreover, 

v{e) = E (/(X)2e-^-^+^W) = E (/(^)'^) 
Combining Equation (14) with Fatou's Lemma yields that lim|5i| v{9) = oo. □ 

Remark 4.3. If X is a random standard normal vector, pg{x) = p{x — 9) and X^^^ is 
a random normal vector with mean 6 and identity covariance matrix. Hence, we recover 
Equality (10). 

5 Application to the Gaussian random vector framework 
5.1 Presentation of the problem 

We consider a D— multidimensional local volatility model in which each asset is supposed 
to be driven by the following dynamics under the risk neutral measure. 

dSl = Siirdt + ait, SI) ■ dWi), 5^ = s\ 

is a vector of correlated standard Brownian motions. The covariance 
structure of the Brownian motions is given by {W, W)t = where T is a definite positive 
matrix with a diagonal filled with ones. In our numerical examples, we take Tij = l{i=j} + 
p^lijLj} with p G {^^, 1) to ensure that the matrix T is positive definite. The function a is 
the local volatility function, r is the instantaneous interest rate and the vector (s^, . . . , s^) 
is the vector of the spot values. In this model, we want to price path-dependent options 
whose payoffs can be written as a function of {St,t < T). Hence, the price is given 
by the discounted expectation e"''"^ E(V'(5'(, t < T)). Most of the time, this expectation 
must be computed by Monte Carlo methods and one has to consider an approximation of 
'^{St,t < T) on a time grid = to < < " " " < ^A^ = ^- Then, the quantity of interest 
becomes 

The discretisation of the asset S can for instance be obtained using an Euler scheme, 
which means that the function tp can be expressed in terms of the Brownian increments 
or equivalently using a random normal vector. These remarks finally turn the original 
pricing problem into the computation of an expectation of the form E(0(G)) where G 
is a standard normal random vector in and (/> : M^^^ i — > M is a measurable and 

integrable function. Using Equation (10), we have for all 9 € W^, 

E(<^(G)) =E(^0(G + A0)e-^^-^-^^ , (15) 

where A \s d y. ND matrix. The particular choice d = ND and A = corresponds to 
Equation (10). When D = 1, the choice d = 1 and A = (-v/ti, \/*2 — ^i, ■ ■ • > \/fiv^-"f/v^)* 
corresponds to adding a linear drift to the one dimensional standard Brownian motion W 
and we recover the Cameron-Martin formula. 

Transformation (15) actually relies on an importance sampling change of measure. 
Other strategies may be applicable such as stratification for instance as it is explained by 
Glasserman et al. in [5]. 
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5.2 Bespoke estimators for the optimal variance parameter 

It ensues from Proposition 4.1, that the second moment 

vi9) = E (^(G + ^0)2e-2^^-«-l^^l') = E (^0(G)2e-^^-G+^^ (16) 

is strongly convex, infinitely differentiable and 

Vv{9) = E (^A*{Ae - G)4>{Gfe-^'^-^+^^ . (17) 

If we apply Equation (10) again, we obtain an other expression for 

Vvi9) = E (^-A*G(j)iG + ^0)2e-2Ae.G+|Ae|2^ _ ^^g^ 

Let us introduce the following two functions 

U^{0, G) = A*{Ae - GmGfe~^'^-^+^, (19) 
C/2(^, G) = -A*G(I){G + A^)2e-2^^-^+l^^l'. (20) 

Using either Equation (17) or (18), we can write Vv{9) = E([/2(6', G)) = E([/i(6', G)) and 
these two functions and fit in the framework of Section 3 and enable to construct 
two estimators of 6* {0^)n and (0^)n following Equation (7) 

Cl = TK^n (^n - 7n+lU\el Gn+l)) , (21) 
= Tk^^ K - ln+lUH9lGn+l)) , (22) 

where G„ is an i.i.d sequence of random variables following the law of G. We also introduce 
their corresponding averaging versions (^"'^n)n and (^^ri)n following Equation (9). Based 
on Equation (15), we define 

H{9, G) = 4>{G + ^0)e-^^-^-^. 

Corresponding to the different estimators of 9* listed above, we can define as many ap- 
proximations of E((/>(G)) following Equation (3) 

^ n ^ n 

n ^-^ n ^-^ 

i=l i=l 

^ n 1 " 

■Cn = - /J-H' (6*1-1, Gi), = -y_]-f^(6'j^-l,Gi), 

n ^ — ^ ti ^ — ^ 



n ^ — ' n 

i=l i=l 



where the sequence Gi has already been used to build the {9n)n estimators. Prom Propo- 
sition 4.1 and Theorems 3.1, 2.1 and 2.3, we can deduce the following result. 

Theorem 5.1. // there exists e > such that E(0(G)^'^^) < oo then, the sequences 

{9\)n, {(^n)n, {(^^n)n CLnd {9'^n)n defined by Equations (7) or (9) converge a.s. to 0* for 
any increasing sequence of compact sets {Kj)j satisfying (6) and the adaptive estimator 
{in)n,{in)ri^{in)n^{in)n Converge to E((/)(G)) and are asymptotically normal with optimal 
limiting variance v{9*). 
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Proof. We only do the proof for (^^)n and {0^n)n as the same ideas can be apphed to {9n)n 
and {6'^n)n- We know from Proposition 4.1, that the function v defined by Equation (16) 
is strongly convex and infinitely differentiable, hence satisfies Assumption (Al). Let 
g > 0. For any 6 satisfying |0| < g, we have 



Using Holder's inequality, it can easily be proved that the expectation on the right hand 
side is uniformly bounded for \9\ < q. Hence Assumption (A2) is satisfied. Therefore, 
and ^.^ both converge to 6*. Let i] > 0, 



E|i7(e,G)r+'' = E |0(G)r+^e 



Using the integrability of (j){G) and Holder's inequality, one can prove that the expectation 
on the .r.h.s is bounded for ^ in a ball. Moreover, combining this with Lebesgue's theorem, 
we obtain that the functions 6 i — > E\H{9,G)\'^ and 9 i — > E|i7(6', are continuous. 
Therefore, the convergence and asymptotic normality of issues from Theorems (2.1) 
and (2.3). □ 

Remark 5.2. Theorem 5.1 extends the result of [2, Theorem 4]- Our result is valid for 
any increasing sequences of compact sets {Kj)j satisfying (6) whereas Arouna needed a 
condition on the compact sets to ensure the convergence of the {9n)n estimators. The 
only condition required is some integrability on the payoff function and nothing has to be 
checked along the algorithm paths, which is a great improvement from a practical point of 
view. 

For the vast majority of payoff functions commonly used, the assumptions of Theo- 
rem 5.1 are always satisfied. 



5.3 Numerical results 

5.3.1 Complexity of the different approximations 

In the introduction, we have presented two different strategies for implementing a variance 
reduction method based on the approximation of the optimal variance parameter. We 
know from Theorem 5.1, that the adaptive and non-adaptive algorithms both converges 
at the same rate and the same limiting variance. Therefore, to decide which one is the 
better, we have to compare their computational costs. In this section, we assume that the 
computational cost of the different algorithms is determined by the number of evaluations 
of the function (p. We will see in the examples later that this assumption is realistic and 
therefore it becomes obvious that the averaging and non-averaging estimators of 9* all 
have the same computational costs when implemented with expertise. 



The non-adaptive algorithm We know from [2, 23] that the sequential algorithm 
converges with a rate of ^Jv{9*)/n if we have 2n samples at hand and want to implement 
the sequential algorithm by using the first n samples for approximating 9* and the last n 
samples for actually computing the Monte Carlo estimator with the previously computed 
approximation of 9* . Whatever approximation of 9* is used, be it {9\)n, {9n)n, {9^n)n or 
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{9^n)n, this algorithm requires 2n evaluations of the function (j) whereas a crude Monte 
Carlo method only evaluates the function (p n times and achieves a convergence rate of 
■\/v(^)Jn^ hence this method only becomes efficient when v{9*) < f(0)/2. 

The adaptive algorithm From Theorem 5.1, we know that the adaptive estimators 
{^n)n, (^n)") (^n)n, (^n)n Converge with the same rate y/v{9*)/n but as we will see it 
they do not have the same computational cost. First, let us concentrate on {£,n)n and 
{tn)n, at each iteration i, the function (p has to be computed twice : once at the point 
Gj+i + 9l (or Gj+i + 9j) to update the Monte Carlo estimator and once at the point 
Gj+i to update 9j_^_^ or 9j_^^. Hence, the computation of or requires 2n evaluations 
of the function (p. Similarly, the computation of requires at each step 2 evaluations of 
the function (j) : one at the point Gj+i + 9f to update the Monte Carlo estimator and one 
at the point Gj+i + 9f to update the stochastic algorithm. So the overall cost is still 2n 
evaluations of the function (p. But looking closely at the computation of {£,n)n 
immediately highlights the benefit of having put the parameter 9 back into the function 
(j) in the expression of Vv : the updates of ^f^^ and 9f_^_l both use the evaluation of the 
function cp at the same point Gj+i + 9f. Hence, the computation of only needs n 
evaluations of the function (p instead of 2n for all the others algorithms. Obviously, the 
computational costs of the different estimators cannot really be reduced to the number 
of times the function cp is evaluated so one should not expect that computing is twice 
less costly than the other estimators but we will see in the examples below that the 
estimator is indeed faster than the others. 

To shortly conclude on the complexity of the different algorithms, be they sequential 
or adaptive, one should bear in mind that all the estimators except (^^)n roughly require 
twice the computational time of the crude Monte-Carlo method. 

5.3.2 Practical implementation 

The choice of using Equations (7) or (9) to build an estimator of 9* becomes really 
important when one has to implement the variance reduction procedure either by using 
Algorithms (1.1) or (1.2). Both the averaging and non-averaging strategies have pros and 
cons. The averaging algorithm theoretically converges a little slower but has a much 
smoother behaviour with respect to the proper adjustment of the gain sequence (7n)n- 
Then, to have a robust estimator — in the sense that the numerical convergence of the 
estimator does not depend too much on the choice of of the gain sequence — the 
averaging procedure proves to be better in practice. The non averaging algorithm should 
converge a little faster even though we do not notice it in practise as the convergence 
oscillates too much and is far more sensitive to the proper choice of the sequence (7n)n- 
Eventually, both algorithms produce very similar results regarding variance reduction; 
the averaging one is easier to tune but requires more computational time. 

In the numerical experiments of this section, we compare the different algorithms on 
multi- asset options. The quantity "Var MC" denotes the variance of the crude Monte 
Carlo estimator computed on-line on a single run of the algorithm. The variance denoted 
"Var (resp. "Var ,^^") is the variance of the ADIS algorithm (see Algorithm 1.2) 
which uses (^^)n (resp. (^^)n) to estimate 9*. These variances are computed using the 
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on-line estimator given by Corollary 2.4. These adaptive algorithms are also compared 
to the sequential strategy described by Algorithm 1.1 denoted by or "^^+MC" 

depending on how 0* is approximated. In all these algorithms, the matrix A introduced 
in Equation (15) is chosen as the identity matrix. When A is not the identity matrix, its 
purpose is to reduce the dimension of the space in which the optimal 9* is searched and 
in such cases the algorithms will be call "reduced". Note that for the comparison to be 
fair between the different strategies, we have used n samples for the adaptive algorithms 
but 2n for the sequential algorithms so that they all satisfy a central limit theorem with 
the rate ^/n. 

Basket options We consider options with payoffs of the form {Yli=i i^^S^ — K)^ where 
[u^ uj'^) is a vector of algebraic weights (enabling us to consider exchange options). 



p 


K 


7 


Price 


Var MC 


Var 


Var |2 


0.1 


45 


1 


7.21 


12.24 


1.59 


1.10 




55 


10 


0.56 


1.83 


0.19 


0.14 


0.2 


50 


0.1 


3.29 


13.53 


1.82 


1.76 


0.5 


45 


0.1 


7.65 


43.25 


6.25 


4.97 




55 


0.1 


1.90 


14.74 


1.91 


1.4 


0.9 


45 


0.1 


8.24 


69.47 


10.20 


7.78 




55 


0.1 


2.82 


30.87 


2.7 


2.6 



Table 1: Basket option in dimension d = 40 with r = 0.05, T = 1, Sq = 50, a* = 0.2, 
= i for alH = 1, . . . , d and n = 100000. 



Estimators 


MC 


e 




CPU time 


0.85 


0.9 


1.64 



Table 2: CPU times for the option of Table 1. 

The results of Table 1 indicate that the adaptive algorithm using an averaging stochas- 
tic approximation outperforms not only the crude Monte Carlo approach but also the 
adapted algorithms using non-averaging stochastic approximation. The better perfor- 
mance of the algorithms using averaging estimators of 9* comes from the better smooth- 
ness of the averaging algorithm (see Equation (9)). Nonetheless, these good results in 
terms of variance reduction must be considered together with their computation costs re- 
ported in Table 2. As explained in Section 5.3.1, we notice that the computational cost of 
the estimator is very close to the one of the crude Monte Carlo estimator because the 
implementation made the most of the fact that the updates of and 9f^^ both need to 
evaluate the function (j) at the same point. Note that, because this implementation trick 
cannot be applied to the adaptive algorithm using an averaging stochastic approxima- 
tion is twice slower. For a given precision, the adaptive algorithm is between 5 and 10 
times faster. 
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Barrier Basket Options We consider basket options in dimension D with a discrete 
barrier on each asset. For instance, if we consider a Down and Out Cah option, the 
payoff writes down (YliLi^^^T ~ -^)+l{Vi<r>, VjXiV, S'j.>L»} where u = {u^, . . . ,uj^) is a 

vector of positive weights, L = (L^, . . . ,L^) is the vector of barriers, K > the strike 
value and t]\f = T. We consider one time step per month, which means that for an 
option with maturity time T = 2, the number of time steps is = 24. From now 
on, we fix = 5. Hence if we use the identity matrix A, the parameter 9 is of size 
N X D = 120. Here, we propose to reduce the dimension of 9 and we will in Table 3 that 
it achieves almost the same variance reduction. Of course the matrix A cannot be chosen 
independently of the structure of the problem. Remember that the vector G actually 
corresponds to the increments of the Brownian motion B with values in R'^ on the grid 
{tk = kT/N, k = 0, . . . ,N). We recall that we can simulate the Brownian motion B on 
the time grid (ifc)^ by using the following equality in law 







Bt2 




BtN^l 




\ Btr, J 














N-1 



tN-2lD 












G 



where Id is the identity matrix in dimension D. If we choose 

\ 

- tiln 

A 



\\/tN - tN-lIoJ 

then the transformation G + A9 corresponds to the transformation (Bt-^ + 9ti,Bt^ + 
9t2, ■ ■ ■ ,Btff + 9tN)* and it reduces the effective dimension of the importance sampling 
parameter to D = 5 rather than DN = 120. 



K 


7 


Price 


Var MC 


Var ^2 


Var |2 


Var 


Var ^2 


Var |2 


Var 




































reduced 


reduced 


reduced 


45 


0.5 


2.37 


22.46 


4.92 


3.52 


2.59 


2.64 


2.62 


2.60 


50 


1 


1.18 


10.97 


1.51 


1.30 


0.79 


0.80 


0.80 


0.79 


55 


1 


0.52 


4.85 


0.39 


0.38 


0.19 


0.24 


0.23 


0.19 



Table 3: Down and Out Call option in dimension / = 5 with a = 0.2, Sq = 
(50,40,60,30,20), L = (40,30,45,20,10), p = 0.3, r = 0.05, T = 2, w = 
(0.2, 0.2, 0.2, 0.2, 0.2) and n = 100 000. 



First, we note from Table 3 that the reduced and non-reduced ADIS algorithm achieve 
almost the same variance reduction. Actually, it is even advisable to reduce the size of 
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Estimators MC ^^2 q2 _^ y^Q 



^2 |2 02 ^ 



reduced reduced reduced 



CPU time 1.86 1.93 3.34 



4.06 



1.89 2.89 3.90 



Table 4: CPU times for the option of Table 3. 



the importance sampling parameter to reduce the noise in the stochastic approximation 
and therefore in the adaptive Monte Carlo estimator. Comparing the columns "Var MC" , 
"Var and "Var points out that when the convergence of the estimator of 9* is too 
slow the first iterates of the adaptive Monte Carlo estimators use wrong values of 9 and 
therefore cannot reach v{9*) whereas if a sequential algorithm is used all the iterates of 
the Monte Carlo estimator use the same and better approximation of 9. We can see in 
Table 3 that the variance of "^2+MC" is half the one of or but its CPU time is twice 
the one of as noted in Table 4. 

The reduced algorithms are a little faster than the non-reduced ones but their real 
advantage is to converge much more stably and to achieve the same variance as "^2+MC" 
but in far less computational time. As in the previous examples, we still observe that the 
estimator "^^ reduced" is faster than the others and has a variance very close to the best 
method which is ''9'^+MC\ 

6 Conclusion 

In this work, we have explained how one could devise an adaptive variance reduction 
method for computing an expectation with a free parameter. Different algorithms have 
been studied both from a theoretical point of view and in practice. Although all the adap- 
tive algorithms satisfy the same central limit theorem, they may behave very differently 
in practice, in particular adaptive algorithms using a non-averaging stochastic approxima- 
tion of the optimal variance parameter can be implemented in a clever way which makes 
them as fast as a crude Monte Carlo approach. Nevertheless, the numerical convergence 
of these stochastic is very sensitive to the tuning of their gain sequence and one way to 
smooth this behaviour is to plug an averaging procedure on top of the stochastic approx- 
imation but then the computational time significantly increases and yet the dependency 
with respect to the gain sequence is still a serious drawback. To encounter the fine tuning 
of the algorithm, Jourdain and Lelong [9] have recently suggested to use deterministic 
optimisation techniques coupled with sample approximation, but their technique can not 
be implemented in an adaptive manner which increases its computational cost. 
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